Using R to Orchestrate APIs

A presentation for Research Data at the Edge, Day One of Duke Research Computing Symposium

Hosted by the Data & Visualization Services Department.

The Files

The presentation materials were composed in R Markdown via RStudio and stored in a GitHub repository; the slides and notebook are served via GitHub Pages.

Outline

Why?

The Web has lots of stuff

  • frontier beyond curated datasets
  • stuff is wrapped in HTML
  • HTML is transported over HTTP but composed for human-to-machine (h2m) consumption
  • Intellectual Property rights bear serious consideration

API

Application Programming Interface

  • Built for machine-to-machine interactions
  • Instructions for programs

Client / Server

  • Make [R] interface with the web
  • Same as h2m but now m2m

Human Simulation

A dramatization…

  • Person uses Web Client
    • Person enters a URL

    • client & server negotiate
      dramatization: good handshake
    • Information is sent back in wrapped HTML
    • Web Browser parses the HTML

m2m – development

dramatization: confused about the protocol

JSON

# from https://en.wikipedia.org/wiki/JSON
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    },
    {
      "type": "mobile",
      "number": "123 456-7890"
    }
  ],
  "children": [],
  "spouse": null
}

Example

To Follow Along

  1. Open an RStudio Docker Container - https://vm-manage.oit.duke.edu/containers/rstudio
  2. Project > New Project
  3. Version Control > Git
  4. Repository URL = https://github.com/libjohn/r-api-json.git > Create Project
  5. Open API-JSON-Symposium.Rmd file
    • Run All
    • Go to line 150-ish (“### Demonstration”)

Demonstration

library(jsonlite)
# https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html
# for building tibbles
library(tidyverse)

Single JSON object

When the server response is a single JSON object, jsonlite returns it as a named list, which makes viewing the data pretty simple.

oneJSONresult <- fromJSON("http://www.omdbapi.com/?t=rocky&y=&plot=full&r=json")

Let’s see the results in the next slide


oneJSONresult
$Title
[1] "Rocky"

$Year
[1] "1976"

$Rated
[1] "PG"

$Released
[1] "03 Dec 1976"

$Runtime
[1] "120 min"

$Genre
[1] "Drama, Sport"

$Director
[1] "John G. Avildsen"

$Writer
[1] "Sylvester Stallone"

$Actors
[1] "Sylvester Stallone, Talia Shire, Burt Young, Carl Weathers"

$Plot
[1] "Rocky Balboa is a struggling boxer trying to make the big time, working as a debt collector for a pittance. When heavyweight champion Apollo Creed visits Philadelphia, his managers want to set up an exhibition match between Creed and a struggling boxer, touting the fight as a chance for a \"nobody\" to become a \"somebody\". The match is supposed to be easily won by Creed, but someone forgot to tell Rocky, who sees this as his only shot at the big time."

$Language
[1] "English"

$Country
[1] "USA"

$Awards
[1] "Won 3 Oscars. Another 16 wins & 21 nominations."

$Poster
[1] "https://images-na.ssl-images-amazon.com/images/M/MV5BMTY5MDMzODUyOF5BMl5BanBnXkFtZTcwMTQ3NTMyNA@@._V1_SX300.jpg"

$Metascore
[1] "N/A"

$imdbRating
[1] "8.1"

$imdbVotes
[1] "387,927"

$imdbID
[1] "tt0075148"

$Type
[1] "movie"

$Response
[1] "True"

The resulting list behaves as you would expect in R.
  • You can list all the element names.
names(oneJSONresult)
 [1] "Title"      "Year"       "Rated"      "Released"   "Runtime"    "Genre"      "Director"   "Writer"     "Actors"    
[10] "Plot"       "Language"   "Country"    "Awards"     "Poster"     "Metascore"  "imdbRating" "imdbVotes"  "imdbID"    
[19] "Type"       "Response"  
  • List an individual element
oneJSONresult$Title
[1] "Rocky"
oneJSONresult$Awards
[1] "Won 3 Oscars. Another 16 wins & 21 nominations."

A JSON Matrix

The results of this code snippet display differently in the console, the Notebook script (console), and the Notebook HTML output. In the Notebook script output you can find the component name, in this case $Search. Alternatively, you can use bracket notation: [[1]]. Once you identify the component name, it’s easier to identify the element names.

jsonSeriesResutlsMatrix <- fromJSON("http://www.omdbapi.com/?s=rocky&type=series&r=json&page=1")
jsonSeriesResutlsMatrix
$Search

$totalResults
[1] "20"

$Response
[1] "True"

Call the search results and coerce the JSON array into a data frame.

jsonSeriesResutlsMatrix$Search

jsonSeriesResutlsMatrix$Search$Title
 [1] "Rocky and His Friends"         "Dr. Jeff: Rocky Mountain Vet"  "Rocky Jones, Space Ranger"     "Rocky Mountain Law"           
 [5] "Rocky King, Detective"         "Rocky Road"                    "Rocky Mountain Bounty Hunters" "Rocky + Drago"                
 [9] "Rocky Point"                   "Rocky Star"                   

Resources

---
title: "Using R to Orchestrate APIs"
author: "John Little"
date: '`r Sys.Date()`'
output:
  slidy_presentation: default
  html_notebook: default
---
## Using R to Orchestrate APIs

A presentation for [Research Data at the Edge](http://library.duke.edu/edge/events/rc17), Day One of [Duke Research Computing Symposium](https://rc.duke.edu/symposium-2017/)

Hosted by the [Data & Visualization Services](http://library.duke.edu/data/) Department.  

### The Files
- GitHub repo -- https://github.com/libjohn/r-api-json 
- Slides -- https://libjohn.github.com/rcs2017/slides.html
- Notebook -- http://libjohn.github.io/rcs2017/notebook.html 

The presentation materials were composed in *R Markdown* via *RStudio* and stored in a *GitHub repository*; the slides and notebook are served via *GitHub Pages*.  



## Outline

* API
* JSON
* R / RStudio

## Why?

### The Web has lots of stuff
+ frontier beyond curated datasets
+ stuff is wrapped in HTML
+ HTML is transported over HTTP but composed for human-to-machine (h2m) consumption
+ Intellectual Property rights bear serious consideration

<!-- NASA animated GIF ///  http://i.giphy.com/l2Jht4lIfEQfJ3zj2.gif    --> 
<!--  good human handshake ///  http://giphy.com/gifs/thomas-U2XboRuN89Idi -->
<!-- after the research handshake is complete /// http://giphy.com/gifs/80s-1980s-thomas-dolby-wCKmBd7oNtA4g  --> 
<!-- the confusion of the m2m handshake ///   http://giphy.com/gifs/thomas-MjkCYjM46NrrO -->

## API

### Application Programming Interface 

* Built for machine-to-machine interactions
* Instructions for programs

<!-- http://mobile-gps.net/2015/01/ -->
![](images/api.png)


---    

### Client / Server 


![](images/Client-server-model.svg.png) 

* Make [R] interface with the web
* Same as h2m but now m2m


<!-- https://pixabay.com/en/client-server-networking-laptop-341420/ -->
---  

### Human Simulation

#### A dramatization...

* Person uses Web Client
    + Person enters a URL<br>
    ![](images/URL.PNG)
    
    + client & server negotiate<br> 
    ![dramatization: good handshake](images/good-handshake.gif) 
    + Information is sent back in wrapped HTML
    + Web Browser parses the HTML 
    
<!-- https://commons.wikimedia.org/wiki/File:Uniform_Resource_Locator_(URL)_example.PNG -->
<!-- https://commons.wikimedia.org/wiki/File:HTML.svg -->

## m2m -- development


![dramatization: confused about the protocol](images/development-confusion.gif)
    
## JSON

* [JavaScript Object Notation](https://en.wikipedia.org/wiki/JSON) is a language-independent data format
* Currently the most common data format for asynchronous client/server communication
* Consists of key-value pairs

<!-- http://i.vimeocdn.com/video/541935816_1280x720.jpg -->
<!-- Vimeo on What is JSON // https://vimeo.com/144162102 -->


```{json example}
# from https://en.wikipedia.org/wiki/JSON
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    },
    {
      "type": "mobile",
      "number": "123 456-7890"
    }
  ],
  "children": [],
  "spouse": null
}
```
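
As a rough illustration of how key-value pairs map onto R objects, the sketch below parses a trimmed copy of the record above with `jsonlite::fromJSON()`. The variable names `person_json` and `person` are only illustrative, and the mapping shown relies on jsonlite's default simplification settings.

```r
library(jsonlite)

# A trimmed copy of the example record, held in an R string.
person_json <- '{
  "firstName": "John",
  "isAlive": true,
  "age": 25,
  "address": {"city": "New York", "state": "NY"},
  "phoneNumbers": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "office", "number": "646 555-4567"}
  ]
}'

person <- fromJSON(person_json)

person$firstName      # JSON string  -> character vector
person$isAlive        # JSON boolean -> logical
person$age            # JSON number  -> numeric
person$address$city   # nested object -> nested list
person$phoneNumbers   # array of objects -> data frame (one row per object)
```

Nested objects are reached by chaining `$`, and an array of similar objects becomes a data frame whose columns are the shared keys.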


## Example

### To Follow Along
1. Open an RStudio Docker Container - https://vm-manage.oit.duke.edu/containers/rstudio 
2. Project > New Project
3. Version Control > Git 
4. Repository URL = https://github.com/libjohn/r-api-json.git > Create Project 
5. Open *API-JSON-Symposium.Rmd* file
    + Run All
    + Go to line 150-ish ("### Demonstration") 

--- 

### OMDb API 

- http://www.omdb.org/
    - like http://imdb.com/
- no API keys required
- http://www.omdbapi.com/
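
The request URLs used in the demonstration are a base address followed by `key=value` query parameters. A minimal sketch of assembling one in base R follows; `build_query_url()` is a hypothetical helper written for this example, not part of any package, and the parameter names are the ones that appear in the demonstration URLs.

```r
# Hypothetical helper: join a named list of parameters into a query string.
build_query_url <- function(params, base = "http://www.omdbapi.com/") {
  values <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  pairs <- paste0(names(params), "=", values)
  paste0(base, "?", paste(pairs, collapse = "&"))
}

search_url <- build_query_url(list(s = "rocky", type = "series", r = "json", page = 1))
search_url
```

Building the URL from a list keeps each parameter visible and makes it easy to change one (for example `page`) without retyping the whole string.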

--- 

### Demonstration


```{r load-library-package, message=FALSE, warning=TRUE}
library(jsonlite)
# https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html

# for building tibbles
library(tidyverse)
```


### Single JSON object
When the server response is a single JSON object, jsonlite returns it as a named list, which makes viewing the data pretty simple.
```{r singleJSONresult}
oneJSONresult <- fromJSON("http://www.omdbapi.com/?t=rocky&y=&plot=full&r=json")
```

Let's see the results in the next slide

---

```{r}
oneJSONresult
```


--- 

##### The resulting list behaves as you would expect in R.  

- You can list all the element names.

```{r}
names(oneJSONresult)
```

- List an individual element


```{r}
oneJSONresult$Title
```

```{r}
oneJSONresult$Awards
```
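
Every field in this response arrives as a character string, including the numeric-looking ones. The sketch below shows one way to convert a few of them; the `movie` list is a hand-typed stand-in holding values from the Rocky result, so the block runs without a network call.

```r
# Stand-in for part of the parsed response; the API returns these as strings.
movie <- list(
  Year       = "1976",
  Runtime    = "120 min",
  imdbRating = "8.1",
  imdbVotes  = "387,927",
  Actors     = "Sylvester Stallone, Talia Shire, Burt Young, Carl Weathers"
)

year    <- as.integer(movie$Year)
runtime <- as.integer(sub(" min$", "", movie$Runtime))     # drop the unit suffix
rating  <- as.numeric(movie$imdbRating)
votes   <- as.integer(gsub(",", "", movie$imdbVotes))      # drop thousands separators
actors  <- strsplit(movie$Actors, ", ", fixed = TRUE)[[1]] # one element per actor
```

Fields such as `Metascore` can hold `"N/A"`, which `as.numeric()` turns into `NA` with a warning, so conversions like these are worth checking field by field.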


---

### A JSON Matrix
The **results of this code snippet display differently** in the *console*, the *Notebook script* (console), and the *Notebook HTML* output.  In the Notebook script output you can find the component name, in this case `$Search`.  Alternatively, you can use bracket notation: `[[1]]`.  Once you identify the component name, it's easier to identify the element names.
```{r}
jsonSeriesResutlsMatrix <- fromJSON("http://www.omdbapi.com/?s=rocky&type=series&r=json&page=1")
jsonSeriesResutlsMatrix
```

---  

### Call the search results and coerce the JSON array into a data frame.
```{r}
jsonSeriesResutlsMatrix$Search
```

--- 
```{r}
jsonSeriesResutlsMatrix$Search$Title
```
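
To make the shape of a search response easier to see without calling the API, the sketch below parses a hand-written stand-in. Only the `Title` values, `totalResults`, and `Response` are taken from the output of the search; the `Type` column is an assumption based on the `type=series` query, and a real response carries more fields per result.

```r
library(jsonlite)

# Hand-written stand-in for a search response (abbreviated).
search_json <- '{
  "Search": [
    {"Title": "Rocky and His Friends", "Type": "series"},
    {"Title": "Rocky Jones, Space Ranger", "Type": "series"}
  ],
  "totalResults": "20",
  "Response": "True"
}'

result <- fromJSON(search_json)

names(result)                     # component names of the top-level list
class(result$Search)              # the array of objects is now a data frame
result[[1]]$Title                 # bracket notation reaches the same component
as.integer(result$totalResults)   # the count arrives as a string
```

Because `Search` is already a data frame, the usual tools (`nrow()`, `subset()`, or the tidyverse verbs) apply to it directly.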


## R Packages -- Related

*People who use jsonlite also use...*

* [httr](https://cran.r-project.org/web/packages/httr/) -- calls jsonlite in service of its main goal, orchestrating HTTP requests (web scraping)
* [rvest](https://blog.rstudio.org/2014/11/24/rvest-easy-web-scraping-with-r/) --  used for HTML parsing
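
As a hedged sketch of how httr and jsonlite can be combined, the function below makes the request with httr, stops on an HTTP error, and hands the response text to jsonlite. `fetch_json()` is a hypothetical name chosen for this example, and the block assumes both packages are installed.

```r
# Hypothetical helper: GET a URL, fail loudly on HTTP errors, parse JSON.
fetch_json <- function(url, query = NULL) {
  resp <- httr::GET(url, query = query)
  httr::stop_for_status(resp)  # raise an R error on 4xx/5xx responses
  body <- httr::content(resp, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(body)
}

# Not run here: needs network access, and a service may require credentials.
# rocky <- fetch_json("http://www.omdbapi.com/",
#                     query = list(t = "rocky", plot = "full", r = "json"))
```

Compared with passing a URL straight to `fromJSON()`, this separates the HTTP step from the parsing step, so a failed request surfaces as an HTTP error instead of a parse error.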

## Resources 

- RStudio httr video
- jsonlite package
- list of images
- Movies of 1976
    - [OMDB Top Movies](http://www.omdb.org/encyclopedia/year/1976/statistics)
    - [IMDB Most Popular](http://www.imdb.com/year/1976/)

